4.1 General Procedure

Creating a stratified random sampling design involves identifying the features that describe the environmental diversity of the area (soil and land use are the environmental variables generally used to define strata), delineating the strata, determining the number of samples to allocate to each stratum, and sampling at random within each stratum. By identifying relevant classes, combining them to define strata, and allocating an appropriate number of samples to each stratum, a representative sample can be obtained. Random sampling within each stratum helps to ensure that the sample is unbiased and provides a fair representation of the overall conditions in the area.

The first question is how many samples must be collected from each stratum. The sampling scheme starts with the definition of the total number of samples to collect. Determining the sample size is a complex and highly variable process that depends on, among other factors, the specific goals of the study, the variability of environmental proxies, the statistical requirements for accuracy and confidence, and additional considerations such as accessibility, costs and available resources. The optimal number of samples can be determined following the method proposed in Chapter 2 of this manual. The number of samples within each stratum is then calculated using an area-weighted approach, that is, in proportion to the relative area of each stratum. The sampling design in this section must also comply with the following requirements:

  • All sampling strata must have a minimum size of 100 hectares.
  • All sampling strata must be represented by at least 2 samples.

This sampling process ensures the representativeness of the environmental combinations present across the area while maintaining an efficient and feasible field sampling campaign.
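
The area-weighted allocation and the two requirements above can be sketched as follows. This is a minimal illustration in base R, not the manual's own implementation: the stratum names, the areas and the total of 50 samples are hypothetical values chosen only to show the arithmetic.

```r
# Hypothetical strata and areas in hectares (illustrative values only)
strata <- data.frame(
  stratum = c("Ultisols_Forest", "Alfisols_Cropland", "Inceptisols_Pasture",
              "Entisols_Shrubland", "Entisols_Urban"),
  area_ha = c(5200, 2600, 1950, 150, 90)
)

n_total     <- 50   # total number of samples (assumed)
min_area_ha <- 100  # requirement: minimum stratum size
min_samples <- 2    # requirement: minimum samples per stratum

# Drop strata below the minimum size
strata <- strata[strata$area_ha >= min_area_ha, ]

# Area-weighted allocation, raised to the minimum where needed
weights  <- strata$area_ha / sum(strata$area_ha)
strata$n <- pmax(min_samples, round(n_total * weights))

print(strata)
```

Because of rounding and the minimum of 2 samples per stratum, the allocated total can differ slightly from the requested total; in this toy case 51 samples are allocated instead of 50.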

4.1.1 Strata creation

We must first determine the kind of information that will be used to construct the strata. In this manual, we present a simple procedure to build strata from two environmental layers: soil groups and land use classes. The information should be provided as vector shapefiles with their associated attribute tables. Both datasets often contain a large number of categories, which would lead to a very large number of strata. It is therefore advisable to aggregate similar categories within each input dataset, reducing the number of categories as much as possible while still capturing most of the relevant variability in the area.

The first step is to set up the RStudio environment and load the required packages.
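
A minimal sketch of this step is given here. The package list is an assumption inferred from the function calls used later in this section (sf for st_intersection, dplyr for select and the pipe, mapview for the interactive maps, terra for the optional static plots); the authors' actual list may differ.

```r
# Packages inferred from the calls used in this section (an assumption)
pkgs <- c("sf", "dplyr", "mapview", "terra")

# Check which packages are installed without attaching them
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install before continuing: ", paste(missing_pkgs, collapse = ", "))
}

# Attach the packages that are available
invisible(lapply(pkgs[installed], library, character.only = TRUE))
```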

We must define the number of samples to distribute in the sampling design, and the soil and land use layers used to build the strata. We also define a REPLACEMENT parameter to account for a reduction of the sampling area to the extent of a predefined bounding box, which can also be defined here.
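
The sketch below shows one way these parameters might be laid out. Everything in it is an assumption made for illustration: the value of n, the file names, the treatment of REPLACEMENT as a logical switch, the bounding-box coordinates and the coordinate reference system. A toy square stands in for the soil layer so that the cropping step can be run without the original data.

```r
library(sf)

n <- 250                                  # total number of samples (hypothetical)
soil_path <- "data/soil_map.shp"          # hypothetical file name
lc_path   <- "data/land_use.shp"          # hypothetical file name
REPLACEMENT <- TRUE                       # assumed here to switch cropping on or off
bb <- c(xmin = 0, ymin = 0, xmax = 1000, ymax = 1000)  # hypothetical extent (metres)

# With real data the layers would be read from disk:
# soil <- st_read(soil_path)
# lc   <- st_read(lc_path)

# Toy soil layer: a 2000 m x 2000 m square in a projected CRS (EPSG:32618 assumed)
square <- st_polygon(list(rbind(c(0, 0), c(2000, 0), c(2000, 2000), c(0, 2000), c(0, 0))))
soil <- st_sf(TYPES = "Typic Hapludults", geometry = st_sfc(square, crs = 32618))

# Reduce the sampling area to the bounding box when requested
if (REPLACEMENT) {
  soil <- suppressWarnings(st_crop(soil, st_bbox(bb, crs = st_crs(soil))))
}

cropped_area_ha <- as.numeric(st_area(soil)) / 10000
```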

We proceed with the calculation of soil groups. In this example, soil information is stored in the field TYPES. We have analysed the extent to which the information in this field can be synthesized to eliminate redundancy when creating the strata. 1
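
One common way to carry out such an aggregation is a lookup table that maps each original category to a broader group. The sketch below uses base R on a plain data frame; the same assignment works on an sf object, since it is also a data frame. The values of TYPES are hypothetical examples, and the grouping to soil order in USDA_CLASS is only one possible choice.

```r
# Hypothetical content of the TYPES field (illustrative values only)
soil <- data.frame(
  TYPES = c("Typic Hapludults", "Aquic Hapludults", "Typic Hapludalfs",
            "Typic Eutrudepts", "Lithic Dystrudepts"),
  stringsAsFactors = FALSE
)

# Lookup table: original category -> aggregated class (assumed grouping)
lookup <- c(
  "Typic Hapludults"   = "Ultisols",
  "Aquic Hapludults"   = "Ultisols",
  "Typic Hapludalfs"   = "Alfisols",
  "Typic Eutrudepts"   = "Inceptisols",
  "Lithic Dystrudepts" = "Inceptisols"
)

# Recode and check that every category was matched
soil$USDA_CLASS <- unname(lookup[soil$TYPES])
stopifnot(!anyNA(soil$USDA_CLASS))

table(soil$USDA_CLASS)
```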

The soil classes used to build the strata are shown in Figure ??.

  # Plot final map with the aggregated soil information
  mapview(soil["USDA_CLASS"], alpha = 0, homebutton = TRUE, layer.name = "Soils")

A similar procedure is performed on the land use dataset.

Figure ?? shows the land use classes used to build the strata.

  # Plot final map with the aggregated land use information
  mapview(lc["LU"], alpha = 0, homebutton = TRUE, layer.name = "Land use")

To create the soil-land use strata we must combine both classified datasets.

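  # Combine soil and land use layers (geometric intersection of both maps)
  soil_lc <- st_intersection(soil, lc)
  # Label each resulting polygon with its soil class and land use class
  soil_lc$soil_lc <- paste0(soil_lc$USDA_CLASS, "_", soil_lc$LU)
  # Keep only the stratum label and the geometry
  soil_lc <- soil_lc %>% dplyr::select(soil_lc, geometry)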

Finally, to comply with the initial requirements of the sampling design, we calculate the area of each polygon and delete all features smaller than 100 ha.
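
A minimal sketch of this filtering step with sf is shown here. It assumes the layer is in a projected coordinate reference system with units in metres, so that st_area returns square metres; the two toy polygons, their labels and the EPSG code are hypothetical.

```r
library(sf)

# Helper to build a square polygon of a given side length (metres)
make_square <- function(x0, side) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + side, 0), c(x0 + side, side),
                        c(x0, side), c(x0, 0))))
}

# Toy strata layer: one 400 ha polygon and one 25 ha polygon (EPSG:32618 assumed)
soil_lc <- st_sf(
  soil_lc  = c("Ultisols_Forest", "Alfisols_Cropland"),
  geometry = st_sfc(make_square(0, 2000), make_square(3000, 500), crs = 32618)
)

# Area of each polygon in hectares (1 ha = 10,000 m^2)
soil_lc$area_ha <- as.numeric(st_area(soil_lc)) / 10000

# Keep only features of at least 100 ha
soil_lc <- soil_lc[soil_lc$area_ha >= 100, ]
```

The filter here is applied per polygon, as described in the text. If the 100 ha threshold is meant to apply to the total area of each stratum, the areas would first need to be summed by stratum label.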

The final strata map is shown in Figure ??.

  # Plot final map of the strata
  mapview(soil_lc["soil_lc"], alpha = 0, homebutton = TRUE)
  #terra::plot(soil_lc["soil_lc"], border=NA, main="Strata classes")

4.1.2 Stratified random sampling
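
Random points can be drawn within each stratum with st_sample from the sf package. The following is a minimal sketch under stated assumptions, not necessarily the procedure used to produce the points plotted in this section: the toy strata, the per-stratum sample sizes in n_per_stratum and the EPSG code are hypothetical. The result is named z, with a type column set to "Target", only to mirror the object used in the plotting call of this section. On real data, strata would be the soil_lc layer and n_per_stratum would come from the area-weighted allocation.

```r
library(sf)
set.seed(42)  # make the random draw reproducible

# Helper to build a square polygon of a given side length (metres)
make_square <- function(x0, side) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + side, 0), c(x0 + side, side),
                        c(x0, side), c(x0, 0))))
}

# Toy strata (hypothetical labels, EPSG:32618 assumed)
strata <- st_sf(
  soil_lc  = c("Ultisols_Forest", "Alfisols_Cropland"),
  geometry = st_sfc(make_square(0, 2000), make_square(3000, 1000), crs = 32618)
)
n_per_stratum <- c(8, 2)  # hypothetical allocation, at least 2 per stratum

# Draw the requested number of random points inside each stratum
pts_list <- lapply(seq_len(nrow(strata)), function(i) {
  p <- st_sample(strata[i, ], size = n_per_stratum[i], type = "random")
  st_sf(
    soil_lc  = rep(strata$soil_lc[i], length(p)),
    type     = rep("Target", length(p)),
    geometry = p
  )
})
z <- do.call(rbind, pts_list)

table(z$soil_lc)
```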

  ##soil_lc <- st_read("../soil_sampling/JAM/strata_diss.shp")
  #soil_lc <- st_cast(soil_lc, 'POLYGON')
  #target <- st_read("../soil_sampling/JAM/sampling_points.shp")
  #target <- target[target$type=="Target",]
  # Plot the strata together with the target sampling points
  mapview(soil_lc["soil_lc"], alpha = 0, homebutton = TRUE, layer.name = "Strata") +
    mapview(z[z$type == "Target", ], color = "white", cex = 3, col.regions = "tomato", layer.name = "Target Points")
  #terra::plot(soil_lc["soil_lc"], border=NA, main="Strata classes")
  #terra::points(target[target$type=="Target",])

  1. This exploratory work is a prerequisite and must be adapted specifically to each soil and land use dataset.